ABSTRACT

The nature of speech sounds is studied with particular emphasis on the
information-bearing elements of speech. The relationship among the
zero-crossing rate of amplitude-clipped speech, the formant frequencies,
and the information content of a speech signal is presented and exploited
to extract the first and second formants readily from the speech wave.

Various methods of processing the formants to generate unique patterns
for particular sounds are attempted, with a time plot of the arithmetic
difference of the two formants being explored in detail. The objective
is to obtain machine recognition of speech.

Machine language programs for the Control Data Corporation 160 computer
are prepared to perform a Euclidean comparison of the spoken numbers zero
to nine against a previously stored "dictionary." Testing showed this
type of processing to be satisfactory for some voices, but not readily
extendible to many voices with the same "dictionary." Methods of
overcoming this shortcoming are suggested.

1. Introduction.

In the present age of scientific discovery, man has become more and more
dependent on electronic computers. The future holds in store even greater
use of these devices, with no apparent limit in sight. As this powerful
tool becomes more important in man's day-to-day existence, it becomes
increasingly aggravating that he must speak to it in its mode of
communication, paper tape or punched cards, and not in his own, the
spoken word. Even today, in the very infancy of the computer age, the
time required to do many computations is less than the time required to
instruct the machine in how to do them. This interface problem between
man and machine promises to become even more severe as computers become
more sophisticated.

All this points to the need for a method of achieving machine recognition
of speech. Much work is now being done on this problem, but it is far
from solved. The work described in this thesis is concerned with an
approach to a simplified form of the problem, which may be a stepping
stone on the path to its eventual solution.

2. Nature of Speech Sounds.

In order to gain insight into the information-carrying aspects of
speech, it is well to study the nature of the speech-producing process.
Speech sounds are produced by modulations imposed on the air stream
coming from the lungs. These modulations can occur first in the larynx,
the first valve the air stream meets in its travel. The larynx contains
bundles of muscle fibers, called vocal cords, which can be brought
together to restrict the flow of air or to stop it completely. To produce
a speech sound these folds are brought together to stop the air flow.
When sufficient pressure has built up behind the closed orifice to push
the cords apart, a puff of air escapes. The cords then close until
pressure again forces another puff out. This process, which occurs at a
rate of a few hundred times a second, is called phonation. The nature of
this sound production indicates that the resulting wave is very different
from a pure sinusoid, perhaps more like a triangular wave in shape, so
that harmonics are present extending to frequencies much higher than the
basic rate of phonation. By controlling the tension on the vocal cords,
the speaker controls the fundamental frequency of phonation.

Following the larynx, the air flow, which can be referred to as the
speech wave, passes into the vocal tract, where the major part of the
intelligence to be transferred by the speech process is added. The vocal
tract consists of the throat, mouth, and nasal cavity, and the process by
which these cavities, together with the lips, tongue, and teeth, produce
the desired modulation is called articulation. To understand the effects
of articulation on the speech wave, it is first necessary to define the
various types of speech sounds, since articulation, although similar in
mechanism, plays a different role in information processing for
different sounds: sometimes it adds information to an existing sound
wave, and sometimes it produces the sound wave itself.

Speech sounds may be divided into two classes according to their origin
of production: voiced sounds, which are produced in the larynx as
described above and later modified by the articulation process, and
unvoiced sounds, which are produced solely in the organs that follow the
larynx. If the frequency spectrum of a voiced sound were plotted as it
emerges from the larynx, before any articulation has occurred, it would
look something like Figure 1: a fundamental frequency (corresponding to
the rate of vibration of the vocal cords) occurring with large
amplitude, and harmonics, or multiples of the fundamental frequency,
whose amplitudes decrease with increasing frequency.

Figure 1. Typical frequency spectrum of a laryngeal tone prior to
articulation.

Figure 2. Typical resonance pattern produced by the articulation organs.

Figure 3. Laryngeal tone after articulation.

In the process of articulation of the voiced wave, the organs of the
vocal tract are moved into different positions to produce various
frequencies of resonance. These resonant frequencies serve to enhance the
amplitudes of those frequency components of the voiced sound which fall
in their regions. A plot of these frequency resonances might look
something like that shown in Figure 2.

The combined result of the articulation of Figure 2 acting on the voiced
wave of Figure 1 would be a spectrum of amplitudes occurring
predominantly at those frequencies where the resonance humps occur, with
the amplitude of each hump being less than that of the preceding one as
frequency increases (Figure 3). The selection of three resonance humps
for this example was no accident, since this is precisely the way that
articulation affects the voiced sounds in most instances. These three
frequency regions, where most of the amplitude, and hence energy, is
located, are called formants, and their location in the frequency
spectrum is felt to have much to do with the information-bearing
mechanism of speech. Much more will be said of formants in the pages
that follow.
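
The source-and-articulation picture of Figures 1 to 3 can be sketched
numerically. The sketch below rests on hypothetical values: a 100 Hz
laryngeal tone whose harmonics are assumed to fall off as 1/n^2 (the rate
at which a triangular wave's harmonics fall off), and three simple
resonators at 500, 1500 and 2500 Hz, each 100 Hz wide. It multiplies the
two and reports which harmonics stand out as humps. It illustrates the
idea only and does not model any measured voice.

```python
import math

# Hypothetical parameters chosen for illustration only (not measurements).
F0 = 100.0                      # fundamental (phonation) frequency, Hz
FORMANTS = [(500.0, 100.0),     # (resonant frequency, bandwidth), Hz
            (1500.0, 100.0),
            (2500.0, 100.0)]


def source_amplitude(n):
    """Figure 1: amplitude of the n-th harmonic of the laryngeal tone,
    assumed to fall off as 1/n**2."""
    return 1.0 / n ** 2


def resonance_gain(f):
    """Figure 2: gain of three second-order resonators in cascade."""
    gain = 1.0
    for fr, bw in FORMANTS:
        r = f / fr
        gain /= math.sqrt((1.0 - r * r) ** 2 + (f * bw / fr ** 2) ** 2)
    return gain


def articulated_spectrum(num_harmonics):
    """Figure 3: (frequency, amplitude) of each harmonic after articulation."""
    return [(n * F0, source_amplitude(n) * resonance_gain(n * F0))
            for n in range(1, num_harmonics + 1)]


def formant_peaks(spectrum):
    """Frequencies of harmonics that stand above both neighbours."""
    return [spectrum[i][0] for i in range(1, len(spectrum) - 1)
            if spectrum[i - 1][1] < spectrum[i][1] > spectrum[i + 1][1]]


print(formant_peaks(articulated_spectrum(30)))   # [500.0, 1500.0, 2500.0]
```

With these values the humps fall on the harmonics at the assumed
resonances; since a real vocal tract moves its resonances with
articulation, the humps in a real spectrum move with it.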

Speech sounds which occur with no vibration of the vocal cords are called
unvoiced sounds. A frequency spectrum model of such a sound is evidently
not as simple as that for the voiced sounds, and this is one problem that
makes speech analysis difficult: no one model can be extended to all the
various speech sounds. Indeed, we have only broken speech sounds into two
major classifications and already are unable to describe them with a
single model. More sub-classifications are yet to come, and they will be
equally elusive when it comes to a common basis of modeling. It is to be
noted that vowels, all of which are voiced, steady-state sounds, fall
into one class, while consonants, which are more transient in nature,
occur in both the voiced and unvoiced classes.

Consonants are commonly classified according to the way they are
produced. They are generally divided into six categories, as follows:

Plosives or Stops - These consonants are produced by stopping and then
suddenly releasing the air. The stop plosives are p, b, k, and g.

Continuants - Continuants, unlike plosives, may be continued or prolonged
during a breath. They are further sub-categorized as nasals, laterals,
and fricatives. Nasals are produced by stopping the air in the mouth and
releasing it through the nostrils. Laterals are produced by placing the
tip of the tongue on the upper gum ridge and releasing the air over the
sides of the tongue. The only lateral in the English language is l.
Fricatives are formed by forcing the air through a very narrow opening in
the articulation organs. The fricatives in English are f, v, th, r, h, s,
z, sh, and zh.

Glides - Glides are characterized by a continuous movement of an
articulation organ as the sound is produced. The glides are w (we),
wh (when), and the initial sound in yes.

Vowel-like consonants - These consonants are so named because they have
some of the characteristics of the vowels. They are w, r, l, m, n, ng,
and y as in yes.

Glottal sounds - Glottal sounds are produced in the glottis, the opening
between the vocal cords. The only glottal sound in English is h.

Affricatives - Affricative sounds are plosives followed immediately by
fricatives. The affricatives include ch and j.

There are many other ways to classify speech sounds, and also other
groups within the classification here which have not been included. This
breakdown is not meant to be exhaustive, but only complete enough to make
the reader aware of the definitions of these terms as used in this paper,
and of the degree of difference in speech sounds this researcher assumed
in undertaking the work described later.

3. The Spectrograph.

In attempting machine recognition of speech, the problem is one of
finding elements or parameters of the spoken word that are uniquely
characteristic of that word and no other. Individual speaker
characteristics can be regarded as noise and are not of interest. To be
sure, emphasis, timing, and so on can affect the meaning of what is being
said, but at this stage in the development of speech recognizers, the
simpler problem is sufficiently difficult to warrant study.

There has been much work done on extracting the informational content of
speech from the "noisy" form in which it emerges from the speaker's
mouth. In particular, the objective has been twofold: to reduce the
bandwidth required to transmit the information, and to provide a visual
presentation of the information. The former has resulted in various
kinds of vocoders, such as formant vocoders, correlation vocoders,
fixed-channel vocoders, and hybrid combinations of these (12). The latter
work has been chiefly concerned with the sound spectrograph (21). The
spectrograph is of particular interest to this work, and so a brief
discussion of it follows.

The sound spectrograph was first presented in the literature in
"Science," November 1945. It is essentially a device for making paper
strip recordings of frequency and intensity versus time for short sound
samples. The recordings are made so that the variations of the vocal
resonances (formants) with time are displayed conspicuously. Spectrograph
recordings of vowels (Ref. 21) show the formants as well-defined bars at
specific frequencies. For the consonants, the only way to determine where
their formants lie is to see where the lines come from as they transition
into the vowels.

By studying a few such spectrograph recordings, one can see that the
locations of these formant lines are characteristic of a particular
sound. In particular, the second formant bar is considered to be of
special importance. It would then be reasonable to seek to use this
formant information in a speech recognition scheme, except for the
problem of identifying the consonant formant frequencies. The
spectrograph has shown vividly the importance of the formants among the
information-carrying elements of speech, but it has not provided a means
of obtaining all these formants easily.

4. Nature of Clipped Speech.

If a speech wave is viewed in the time domain, that is, as a plot of
amplitude versus time, an obvious characteristic is the great dynamic
range of amplitudes present. Variations of up to 60 dB are not uncommon
in normal speech. In particular, vowels are on the average 12 to 28 dB
higher than the consonants. This wide dynamic range presents problems in
processing speech for transmission, since a transmitting system would
have to work at a very low average power (and, of course, reduced range)
if the exact shape of the speech wave were to be preserved. In order to
increase the average power, work has been done in the area of speech
clipping. The approach to this problem was to see how much peak clipping
could be accomplished without distorting the signal beyond
comprehensibility.
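
To put these figures in perspective, a level difference in decibels
corresponds to an amplitude ratio of 10^(dB/20) and a power ratio of
10^(dB/10). The short sketch below does that arithmetic for the values
quoted above: a 60 dB swing is a 1000:1 amplitude ratio (a million to
one in power), and the 12 to 28 dB vowel-to-consonant difference is
roughly 4 to 25 times in amplitude.

```python
def db_to_amplitude_ratio(db):
    """Amplitude (voltage or pressure) ratio for a level difference in dB."""
    return 10.0 ** (db / 20.0)


def db_to_power_ratio(db):
    """Power ratio for a level difference in dB."""
    return 10.0 ** (db / 10.0)


for db in (12, 28, 60):
    print(f"{db} dB -> amplitude x{db_to_amplitude_ratio(db):.1f}, "
          f"power x{db_to_power_ratio(db):.0f}")
```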

It has been found by various researchers that clipping the original
speech waveform by up to 12 dB has no noticeable adverse effect on
quality or intelligibility. Clipping of about 12 dB sounds as if the
speaker were enunciating carefully.

The apparent improvement in intelligibility with 12 dB of clipping is
somewhat surprising at first, since the speech wave has definitely been
distorted considerably, and one would expect a degradation in
performance. Actually, by reducing the peaks, which are primarily
associated with the vowels, the process serves to enhance the relative
power in the consonants. Since the consonants are much more transitory in
nature than the vowels, it is appealing to say from an information theory
point of view that they are the primary information-bearing elements in
the signal, and that to increase their relative power is to increase the
emphasis on the information content of the speech wave (22). It also
follows that the individual speaker characteristics are contained more in
the vowels than in the consonants. One would therefore expect clipped
speech to be somewhat less indicative of speaker voice traits, and this
has been confirmed experimentally.
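
The mechanism is easy to see with numbers. The sketch below uses
hypothetical signals, two pure sinusoids standing in for a vowel and a
consonant 20 dB apart, and clips the combined wave 12 dB below its peak.
Only the loud segment is touched, so the vowel-to-consonant level
difference shrinks from 20 dB to 8 dB.

```python
import math


def peak_clip(samples, clip_db):
    """Clip a waveform at a level clip_db decibels below its own peak."""
    peak = max(abs(s) for s in samples)
    level = peak * 10.0 ** (-clip_db / 20.0)
    return [max(-level, min(level, s)) for s in samples]


def peak_level_db(loud, soft):
    """Peak level of `loud` relative to the peak level of `soft`, in dB."""
    return 20.0 * math.log10(max(map(abs, loud)) / max(map(abs, soft)))


# Hypothetical test signal: a loud "vowel" followed by a "consonant" 20 dB
# weaker, both plain sinusoids with 8 samples per cycle.
vowel = [1.0 * math.sin(2 * math.pi * k / 8) for k in range(80)]
consonant = [0.1 * math.sin(2 * math.pi * k / 8) for k in range(80)]

clipped = peak_clip(vowel + consonant, clip_db=12)
before = peak_level_db(vowel, consonant)
after = peak_level_db(clipped[:80], clipped[80:])
print(f"vowel over consonant: {before:.1f} dB before, {after:.1f} dB after")
```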

Pushing the concept of clipped speech to the absolute limit, a group at
Harvard University studied the effects of infinitely clipped speech.
Infinitely clipped speech is produced by clipping, amplifying, and
reclipping until the only information contained in the processed wave is
the instants at which it crosses the time axis, referred to as
zero-crossings. An example of such an infinitely clipped waveform is
shown in Figure 4.
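
In sampled form, infinite clipping amounts to keeping only the sign of
each sample. The sketch below, assuming a hypothetical 100 Hz test tone
sampled at 8 kHz, shows that the clipped wave takes only two values yet
still marks every zero-crossing of the original.

```python
import math


def infinitely_clip(samples):
    """Keep only the polarity of each sample: +1 or -1."""
    return [1 if s >= 0 else -1 for s in samples]


def crossing_indices(clipped):
    """Sample indices at which the clipped wave changes polarity."""
    return [n for n in range(1, len(clipped)) if clipped[n] != clipped[n - 1]]


# Hypothetical input: 50 ms of a 100 Hz tone sampled at 8 kHz.
fs = 8000
x = [math.sin(2 * math.pi * 100 * n / fs + 0.3) for n in range(400)]
clipped = infinitely_clip(x)
print(sorted(set(clipped)), len(crossing_indices(clipped)))
```

Five cycles of the tone give ten crossings, all recoverable from the
two-valued wave.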

Figure 4. (a) Original waveform; (b) After infinite clipping;
(c) Differentiated prior to infinite clipping.

From Figure 4 it can be seen that the amplitude information has been
totally removed from the speech wave. It was found that despite this
severe distortion, the clipped wave was 90% intelligible in the absence
of noise. By differentiating the original speech wave prior to clipping,
that is, by producing an infinitely clipped wave whose zero-crossings
correspond to the maxima and minima of the original wave, an
intelligibility of 95% was obtained. This seems to indicate that a
greater amount of information is contained in the higher frequency
components, which are brought to the forefront by the 6 dB per octave
rise in gain with frequency produced by the differentiating action.
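
The 6 dB per octave figure follows from differentiating a single
sinusoidal component of amplitude A and frequency f:

```latex
\frac{d}{dt}\bigl[A\sin(2\pi f t)\bigr] = 2\pi f A\cos(2\pi f t),
\qquad
20\log_{10}\frac{2\pi(2f)A}{2\pi f A} = 20\log_{10}2 \approx 6.02\ \text{dB}.
```

The differentiator's gain is thus proportional to frequency, and each
doubling of frequency (one octave) raises it by about 6 dB.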

The reason that clipped speech is still intelligible can be further seen
from a frequency spectrum point of view. It is well known, and easily
demonstrated, that the human ear is quite insensitive to phase. It has
been common to assert that the information content of a speech wave is
contained in the energy spectrum of its various frequency components. If
the relative phases of these components are varied, within limits,
thereby producing a wholly different amplitude versus time pattern, the
ear notices no difference.
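
A small numerical sketch makes the point. Two waveforms are built from
the same three components with the same amplitudes but different
relative phases (hypothetical values); they differ clearly in time, yet
their magnitude (energy) spectra are identical.

```python
import cmath
import math

N = 64                                   # samples per analysis period
AMPLITUDES = {1: 1.0, 2: 0.5, 3: 0.25}   # DFT bin -> amplitude (hypothetical)


def synthesize(phases):
    """Sum of cosines at bins 1..3 with the given phases (radians)."""
    return [sum(a * math.cos(2 * math.pi * k * n / N + phases[k])
                for k, a in AMPLITUDES.items())
            for n in range(N)]


def magnitude_spectrum(x):
    """|X[k]| of a direct DFT for k = 0 .. N/2 (phases are discarded)."""
    return [abs(sum(x[n] * cmath.exp(-2j * math.pi * k * n / N)
                    for n in range(N)))
            for k in range(N // 2 + 1)]


wave_a = synthesize({1: 0.0, 2: 0.0, 3: 0.0})
wave_b = synthesize({1: 0.0, 2: math.pi / 2, 3: math.pi})

waveform_gap = max(abs(a - b) for a, b in zip(wave_a, wave_b))
spectrum_gap = max(abs(a - b) for a, b in
                   zip(magnitude_spectrum(wave_a), magnitude_spectrum(wave_b)))
print(f"largest waveform difference: {waveform_gap:.2f}")
print(f"largest spectral difference: {spectrum_gap:.1e}")
```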

The importance of clipped speech to the work undertaken in this thesis is
the relationship of its zero-crossing rate to the formants of the speech
sounds. Chang, Pihl and Essigmann, in their paper "Representations of
Speech Sounds and Some of Their Statistical Properties" (5), have
demonstrated mathematically that the average rate of zero-crossing of
the undifferentiated speech wave is very nearly a measure of the first
formant frequency. Furthermore, the average rate of zero-crossing of the
differentiated wave is a measure of the second formant frequency.
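
A toy example shows the mechanism behind this result. Differentiation
multiplies each component's amplitude by its frequency, so a second
formant that is weaker in the original wave can dominate the
differentiated one. The sketch below assumes a hypothetical
two-component "vowel" (pure tones at 500 and 1500 Hz, the second at half
amplitude) rather than real speech. Counting upward zero-crossings gives
roughly F1 for the wave and F2 for its first difference. Chang et al.'s
derivation concerns averages over real speech spectra, which this does
not reproduce.

```python
import math

FS = 48000    # sampling rate in Hz (assumed)


def upward_crossing_rate(x, fs):
    """Average number of negative-to-positive zero-crossings per second."""
    count = sum(1 for n in range(1, len(x)) if x[n - 1] < 0 <= x[n])
    return count * fs / len(x)


def differentiate(x):
    """First difference: a sampled stand-in for a differentiating circuit."""
    return [x[n] - x[n - 1] for n in range(1, len(x))]


# Hypothetical two-formant "vowel": F1 = 500 Hz at unit amplitude and a
# weaker F2 = 1500 Hz at half amplitude. F2 = 3*F1 with zero relative phase
# is chosen only so that the result can be checked by hand.
f1, f2, a2 = 500.0, 1500.0, 0.5
x = [math.sin(2 * math.pi * f1 * n / FS)
     + a2 * math.sin(2 * math.pi * f2 * n / FS)
     for n in range(FS)]                                  # one second

rate_original = upward_crossing_rate(x, FS)
rate_differentiated = upward_crossing_rate(differentiate(x), FS)
print(round(rate_original), round(rate_differentiated))  # close to F1, F2
```

With other amplitude ratios or non-harmonic frequencies, extra crossings
can appear and the correspondence becomes only approximate.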

From the above interrelation of clipped speech and the first two
formants, which are strongly believed to be the information-bearing
elements of the speech wave, it is hypothesized that equipment could be
designed to obtain the formant frequencies via the clipped-speech
zero-crossing rate. These formant frequencies could then be used
together, or perhaps with the assistance of some other speech parameters,
to distinguish spoken words for at least a limited vocabulary and for a
variety of speakers. This hypothesis is based on the appealing
assumptions that the formant frequencies do contain the information and
that the individual speaker characteristics can be eliminated by going to
the formants via the clipped-speech zero-crossing approach. This
researcher has found no indication in the literature, save the work
discussed below (22), that anyone else has attempted to verify Chang's
mathematical conclusions for speech sounds. That work pointed out some
distinct possibilities, and it is from there that the work for this
thesis began.

5. Syllables Versus Phonemes for Speech Recognition Schemes.

In the initial phases of designing a scheme for the automatic
recognition of speech, the question of how large a speech segment is to
be analyzed at a time has to be considered. It has been suggested by
several researchers and institutions engaged in work on this problem
that the logical approach is phoneme recognition, a phoneme being the
smallest element of a speech sound. Since there are only 40 phonemes in
the English language, this would lead to a minimal stored dictionary,
and would be capable of responding to any word, even those not in
existence at the time the device is designed. However, the problems
associated with analyzing an utterance as small as a phoneme make such a
seemingly optimal approach difficult to implement. To be sure, some
successes have been achieved with this method, notably by the Radio
Corporation of America in their work on speech recognition for the Air
Force.


Their work was directed towards the regions of decreasing and increasing
spectral energy, rather than the energy peaks (formants) themselves.
These features were found to be more easily abstracted and more invariant
for their processing method. The use of phonemes was found to be
satisfactory in their study of segmented speech. It is pointed out,
however, that for continuous speech some provision will have to be made
for the changes that occur in the sound of phonemes caused by
neighboring sounds.

An equally large group has espoused the syllabic approach to the
recognition problem, a syllable in this sense not necessarily meaning the
same thing as a syllable in grammar. Estimates of the number of different
syllables needed in a dictionary to adequately cover the English language
run from 1000 to 2000, with those who embrace the phoneme approach voicing
the latter figure. The number actually needed probably lies somewhere
between, but it would seem that a considerably more limited set could be
used for most applications if and when a method is perfected. The
phonetic typewriter developed by RCA Laboratories (18) is an example of a
working model using syllabic recognition successfully for a vocabulary of
100 syllables. This system operates on an input of syllables or
monosyllabic words spoken one at a time. These utterances are normalized
and their frequency spectra extracted by banks of filters for comparison
with previously stored "dictionary" spectra. The authors make the point
that the syllabic approach was chosen for their work because the various
phonemes have different characteristics when taken out of context, and
thus are not felt to be reliable indicators of the information in
themselves, but only as they exist in syllables.
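
The comparison step can be sketched as nearest-template matching. This is
not RCA's actual procedure, whose normalization and distance measure are
not described here. The sketch assumes each utterance is reduced to a
vector of filter-bank outputs, scaled to unit length to remove loudness,
and compared with each stored "dictionary" vector by Euclidean distance
(the same kind of comparison the abstract describes for this thesis's
own programs). The band values are invented for illustration.

```python
import math


def normalize(spectrum):
    """Scale a filter-bank output vector to unit length so that overall
    loudness does not affect the comparison."""
    norm = math.sqrt(sum(v * v for v in spectrum))
    return [v / norm for v in spectrum]


def euclidean(a, b):
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def recognize(utterance, dictionary):
    """Return the dictionary entry whose normalized spectrum is nearest."""
    u = normalize(utterance)
    return min(dictionary,
               key=lambda word: euclidean(u, normalize(dictionary[word])))


# Hypothetical six-band filter-bank outputs, invented for illustration.
dictionary = {
    "one":   [0.9, 0.7, 0.3, 0.2, 0.1, 0.1],
    "two":   [0.3, 0.9, 0.8, 0.2, 0.1, 0.0],
    "three": [0.2, 0.3, 0.4, 0.6, 0.9, 0.8],
}
spoken = [1.8, 1.5, 0.5, 0.4, 0.2, 0.3]   # a louder, slightly different "one"
print(recognize(spoken, dictionary))      # one
```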

To be sure, this syllabic approach is also a simplification of the
overall problem, since syllables themselves are known to be affected by
the sounds that precede and follow them. Any speech recognition plan that
does not set its sights on recognizing sounds as they occur in connected
speech will never be a completely satisfactory general word recognizer.
To solve the problem in any other form is to deal with it out of its
natural environment, and the results can hardly be extrapolated to the
more general case. However, the problem is complex enough at this stage
of its study to warrant much more work on special-case considerations
until more is learned of the information-carrying modes of speech. The
investigation conducted by this student has encompassed just such a
limited approach to the wider problem, by restricting the study to
monosyllabic words with a few minor exceptions.

The general aim of this work was to investigate experimentally the
formant zero-crossing association discussed by Chang et al. (5) and to
explore the possibility of using these parameters, alone or with others,
to obtain patterns or matrices that are independent of individual speaker
characteristics and highly indicative of the word being spoken. In the
event that the parameters obtained were not suitable for this objective,
it was proposed that the planned methods be used for analyzing individual
utterances such as fricatives, plosives, and so on, to determine whether
there is any correlation between the data for just some such particular
sounds. It is possible that the information contained in the formants,
or, if you will, in the zero-crossing rate, is derived only from
particular articulations and not from all. The information this
researcher obtained from the literature indicates that no really
definitive answers are available in this regard, and it is unknown just
what the most important information-bearing elements in the speech
communication system are. The systems of speech analysis and synthesis
now existing have gained what success they have not so much from the
application of scientific knowledge as from an engineering trade-off of
bandwidth against a conglomeration of other characteristics of the speech
wave, which somehow, through the benevolence of a sympathetic deity, has
worked. The work described herein is to be considered as just one more
such flail at this elusive problem.

6. The Vector Display.

In a report on a government-sponsored research effort on signal
processing by infinite clipping, conducted at the Georgia Institute of
Technology in late 1963 and early 1964, B.O. Pyron and F.R. Williamson,
Jr. discussed a vector display unit which they had developed for visually
displaying voice and other short-time, highly transient signals. They
found that an analog signal proportional to the short-time running
average of the zero-crossing rate of the original or differentiated
speech wave was quite similar for the same sound by many speakers and
distinctly different for other sounds. Another analog signal was produced
proportional to the smoothed envelope of the amplitude of the original
waveform and used as a second coordinate for an oscilloscope display.
This display, called a vector display by its originators, consisted of
the averaged zero-crossing analog applied to the vertical deflection
plates and the amplitude analog applied to the horizontal plates of a
storage-type oscilloscope.
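
The two analogs can be modelled roughly in discrete time. The sketch
below is an illustration under stated assumptions, not the circuit of
Appendix I: it marks upward crossings only, uses a 100 microsecond pulse
and a 10 ms one-pole low-pass filter (hypothetical values), and takes the
rectified, smoothed signal as the amplitude envelope. Plotting the
returned pairs, horizontal against vertical, traces a vector-display
pattern.

```python
import math


def vector_display(x, fs, pulse_width=1e-4, tau=0.01):
    """Rough discrete-time model of the two vector-display analogs.

    Vertical: a fixed-width pulse (as from a monostable) at each upward
    zero-crossing, smoothed by a one-pole low-pass filter with time constant
    tau; its level tracks the short-time crossing rate.
    Horizontal: the full-wave rectified signal smoothed the same way.
    Returns one (horizontal, vertical) point per sample.
    """
    alpha = 1.0 - math.exp(-1.0 / (tau * fs))   # one-pole smoothing factor
    width = max(1, round(pulse_width * fs))       # pulse length in samples
    zc = env = 0.0
    remaining = 0                                 # samples left in current pulse
    points = []
    for n in range(len(x)):
        if n > 0 and x[n - 1] < 0 <= x[n]:
            remaining = width
        pulse = 1.0 if remaining > 0 else 0.0
        remaining = max(0, remaining - 1)
        zc += alpha * (pulse - zc)
        env += alpha * (abs(x[n]) - env)
        points.append((env, zc))
    return points


# Hypothetical input: 0.3 s of a 1 kHz tone at amplitude 0.5, sampled at 48 kHz.
fs = 48000
tone = [0.5 * math.sin(2 * math.pi * 1000 * n / fs) for n in range(14400)]
h, v = vector_display(tone, fs)[-1]
print(f"horizontal {h:.3f}, vertical {v:.3f}")
```

For a steady tone the trace settles near one point: a vertical level
close to the pulse duty cycle (pulse width times crossing rate) and a
horizontal level close to the mean of the rectified wave (2/pi times the
amplitude). A word, whose crossing rate and loudness change from moment
to moment, traces a curve instead.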

The authors reported that the patterns produced by the vector display
tended to correlate well for spoken words and seemed independent of
individual voice characteristics. In particular, the patterns produced
using the differentiated waveform seemed to give the most distinctive
shapes. This is plausible, since the differentiated waveform is felt to
carry more intelligence than the original for infinitely clipped speech.

As a beginning point in this thesis, the circuitry discussed in Pyron and
Williamson's paper was constructed and their vector display studied. The
circuitry used was exactly as presented in their report, with the
exception of minor corrections of obvious typographical errors. This
circuitry is presented and discussed in Appendix I.

Patterns produced were similar to those in the reference. Using a Hughes
Memoscope to hold the highly transient analogs for study, patterns were
generated for the numbers zero through nine by several male speakers. The
objective was a series of patterns distinctly different for different
numbers, but reasonably alike for the same number spoken by various
speakers. If this could be achieved, the ultimate objective was to use a
digital computer to recognize the patterns as the numbers they represent.

In working with the vector display it was noted (as reported by the
originators) that the patterns were greatly affected by channel gain,
bandwidth, and the time constant of the output low-pass filter.
Therefore, in comparing patterns from day to day it became very important
to ensure that precisely the same conditions existed for all the subjects
in question. In particular, the amplitude level of a speaker's voice at
the microphone was most difficult to control, and this was noted to have
an adverse effect on some patterns. However, such variations in speaker
voice power will have to be allowed for in any practical system, and it
is felt that the precautions taken here were sufficient to give the
vector display a fair opportunity to prove itself equal to the
requirements of a machine recognizer of speech. Unfortunately, the
results obtained in this study did not show this type of pattern to be
either unique enough for individual sounds or consistent enough for the
same sound by various speakers. To be sure, some sounds do have quite
distinctive features, in particular those containing plosives or
fricatives, such as ship or tooth, but there seemed to be too many
exceptions to make such a system workable.

A more recent paper on infinitely clipped speech by W.A. Ainsworth (1)
pointed out that clipping systems which do not distinguish between the
polarities of zero-crossings provide less information than those which
do, since to measure the frequency of zero-crossings in both directions
is to measure only the even harmonics of the wave, thereby producing a
harmonically distorted output. Intelligibility tests showed at least a
20% increase in intelligibility achieved by marking only the
zero-crossings in one direction with pulses of the monostable
multivibrator (see Appendix I).
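
The spectral side of this argument can be illustrated with pulse trains.
When every crossing of a symmetric wave of period T is marked, the pulses
recur every T/2, so the train contains only even harmonics of the wave's
fundamental; marking only upward crossings gives a train of period T,
which contains the fundamental as well. The sketch below assumes a
hypothetical 200 Hz tone and measures each pulse train's component at F
and 2F. It illustrates the spectral reasoning only, not Ainsworth's
intelligibility tests.

```python
import cmath
import math

FS = 48000     # sampling rate, Hz (assumed)
F = 200.0      # hypothetical test-tone frequency, Hz


def crossing_pulses(x, upward_only):
    """Unit pulse at each zero-crossing, or only at negative-to-positive ones."""
    pulses = [0.0] * len(x)
    for n in range(1, len(x)):
        up = x[n - 1] < 0 <= x[n]
        down = x[n - 1] >= 0 > x[n]
        if up or (down and not upward_only):
            pulses[n] = 1.0
    return pulses


def component(pulses, f, fs):
    """Magnitude of the pulse train's Fourier component at frequency f."""
    return abs(sum(cmath.exp(-2j * math.pi * f * n / fs)
                   for n, p in enumerate(pulses) if p))


x = [math.sin(2 * math.pi * F * n / FS + 0.1) for n in range(FS)]
for upward_only in (False, True):
    p = crossing_pulses(x, upward_only)
    label = "upward only" if upward_only else "both directions"
    print(f"{label:15}  at F: {component(p, F, FS):6.1f}  "
          f"at 2F: {component(p, 2 * F, FS):6.1f}")
```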

With Ainsworth's results in mind, and with the experience gained with the
vector display, new circuitry was constructed to generate the analog of
the zero-crossing rate of both the differentiated and original speech
waveforms. A block diagram of this circuitry is shown in Figure 5, with
the actual circuits outlined and their operation discussed in
Appendix II.

Results obtained from this processing were somewhat more encouraging, but
not enough to change the original opinion of this type of display.
Various input bandwidth and output low-pass filter time constant values
were tried, in the hope that the variations thereby produced in the
patterns would be more pronounced for some words than for others and make
them distinctive enough for further study. It was concluded on the basis
of several days' work with this approach that no real progress was being
achieved, and the vector display was abandoned.

Figure 5. Block diagram of circuits used to generate the analog of the
zero-crossing rate of the original and differentiated speech signal.

7. Formant One versus Formant Two versus Time.

In the course of working with the vector display, the idea suggested
itself that a display of the averaged zero-crossing rate of the
original waveform versus that of the differentiated waveform should be
of greater interest. This type of display was appealing for the
following reasons:

1) It would be a pattern defined by the first and second formants
of the speech wave (as derived theoretically and demonstrated
experimentally by Chang et al. in Ref. 5), which are held to
be the information-carrying elements of the speech wave.

2)  It  would  eliminate  the  amplitude  parameter  from  the  somewhat 
promising  vector  display,  a  parameter  whose  phase  dependence 
made  its  value  suspect  from  the  very  beginning  of  this  work. 

Memoscope displays of first formant versus second formant were next
generated for the numbers zero to nine spoken by several male speakers.
It became immediately obvious that this type of pattern, although most
promising in theory, left a trace too confusing for worthwhile
analysis. If anything useful was to be obtained from this combination,
another parameter would have to be included to spread the
formant-versus-formant excursions of the trace farther from the origin
of the axes.

Time was the obvious choice for the other parameter. It was included
in the present plot by applying a time sweep to both axes of the
memoscope, producing a time axis rising diagonally across the scope.
The circuitry used to achieve this is shown in Figure 6. Passing the
formant analogs through the amplifier stage in the circuit of Figure 6
introduced 180° phase shifts, and the resulting three-dimensional plot
is as shown in Figure 7.
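The display geometry can be written as a simple projection. In this hedged sketch the sweep rate, the gain, the assignment of formant one to the horizontal axis and formant two to the vertical axis, and the sign inversion (standing in for the 180° amplifier phase shift) are all assumptions. A common ramp is added to both deflection channels, so time runs along the diagonal while each formant analog deflects its own axis.

```python
import numpy as np

def scope_trace(f1, f2, fs, sweep_per_s=1.0, gain=1.0):
    """Map formant analogs onto 2-D scope coordinates with a diagonal time axis.

    A common ramp is added to both axes; each formant analog is inverted
    (assumed effect of the 180-degree amplifier phase shift) and scaled.
    """
    f1 = np.asarray(f1, dtype=float)
    f2 = np.asarray(f2, dtype=float)
    ramp = sweep_per_s * np.arange(len(f1)) / fs  # shared time sweep
    x = ramp - gain * f1  # horizontal deflection (formant one, assumed)
    y = ramp - gain * f2  # vertical deflection (formant two, assumed)
    return x, y
```

With both analogs at zero the trace runs along the 45° diagonal. In general x - y equals gain × (f2 - f1), so the displacement of the trace across the diagonal is proportional to the difference of the two formant analogs.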

Patterns obtained from this type of processing were very promising.
With no bandlimiting on the input waveform, the patterns for a given
speech sound were very consistent across a variety of speakers. For the
numbers zero to nine, however, some of the patterns needed more
individuality, especially those not containing plosives, fricatives,
or stop consonants. Bandlimiting of the input wave to either or both
channels offered possibilities of improvement, as did asymmetrical
weighting of the formant channels. Many speech researchers consider the
second formant to be the principal carrier of information, so it seems
reasonable to give it more emphasis in this kind of plot. No further
work was done with this type of display because the simpler approach
discussed next gave much more interesting results at the same level of
investigation. Typical pictures of the traces obtained with this
display are shown in Appendix III.
8.   Formant  two  minus  Formant  one  versus  Time. 

While working with the processing method just discussed, it became
apparent to me that the display being studied was a vector sum of the
two formants and time (vectors that were not mutually orthogonal). A
simple arithmetic-difference process had been overlooked for no
justifiable reason. Such a combination would have the effect of
canceling the similar portions of the formants, that is, the portions
that are similar at the same instant. Since the Hughes Memoscope had a
high-gain differential preamplifier available as a plug-in unit, it
was no problem to implement such a display.

Figure 6. Circuit used for generating the formant one versus formant
two versus time plot.

Patterns were generated and studied for the numbers zero through nine
spoken by three male speakers. This type of pattern was found to have,
to a fair degree, the simplicity and uniqueness needed for the goal of
machine recognition of speech. The patterns showed very good
consistency for a given sound across speakers and, although different
sounds were not absolutely unique, there was enough variance between
them to make further work here feasible. Typical patterns for this kind
of processing are shown in Appendix IV.

Rather than rely on visual comparison as done previously, it was
decided at this stage of the investigation to feed the analogs being
generated into a computer for comparison. Since such work would be
best accomplished in a real-time environment, both for study and as an
ultimate machine recognition capability, it was decided to do this
work on a small but more readily available computer, the Control Data
160. To be sure, forsaking the capabilities of the larger computers
available here at the Naval Postgraduate School, the IBM 360 and the
SDS 930, required greater programming effort and limited the range of
possible operations. However, for testing and evaluating a system such
as this, the advantage of working in real time cannot be
overestimated. Also, the value of any processing scheme is increased
if the amount of computer capacity needed is held to a minimum, an
inherent requirement here.

In selecting a method of pattern comparison for the computer to
execute, one who has studied some communications theory is immediately
inclined to use a correlation technique. Cross-correlation is defined
as a measure of the similarity between two waveforms as a function of
the time shift between them. However, if cross-correlation is
considered as a matched filtering process, which it is, then the
uselessness of this method in this work becomes apparent.

When a stored signal is cross-correlated with a received waveform
containing that signal plus noise, the result is equivalent to the
autocorrelation of the signal plus a noise term. The effect is that
the process acts as a filter, passing only those frequencies which are
in the signal. Thus, this method of signal processing is very powerful
where a high-frequency signal is buried in wideband noise, as in the
radar problem, but of little use when the signals of interest are
bandlimited to below 300 Hz. As a comparator, correlation gives an
average measure of the similarity between two waveforms and is quite
insensitive to local differences in their amplitudes. Since local
differences in the generated analog waves are precisely the means by
which I have attempted to perform machine recognition, correlation
techniques would not work.
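The insensitivity of an averaged correlation measure to local differences can be illustrated numerically. In this hedged sketch both patterns are synthetic (a slow sinusoid, and the same curve with a brief full-scale excursion added), chosen only to make the point; they do not reproduce any data from this work. The normalized correlation coefficient at zero lag stays close to 1, while the point-by-point absolute error is concentrated entirely at the excursion.

```python
import numpy as np

def zero_lag_correlation(a, b):
    """Normalized correlation coefficient of two equal-length patterns."""
    a = np.asarray(a, dtype=float) - np.mean(a)
    b = np.asarray(b, dtype=float) - np.mean(b)
    return float(np.dot(a, b) / np.sqrt(np.dot(a, a) * np.dot(b, b)))

def local_error(a, b):
    """Point-by-point absolute difference (error versus time)."""
    return np.abs(np.asarray(a, dtype=float) - np.asarray(b, dtype=float))

n = 512
t = np.arange(n) / 1000.0              # 512 samples at 1 kHz
reference = np.sin(2 * np.pi * 2 * t)  # slow, low-frequency pattern
variant = reference.copy()
variant[200:220] += 1.0                # brief local excursion (synthetic)

r = zero_lag_correlation(reference, variant)
err = local_error(reference, variant)
print(f"correlation = {r:.3f}")
print("max error outside excursion:", err[:200].max())
print("max error inside excursion:", err[200:220].max())
```

The correlation coefficient changes only slightly, whereas the absolute error isolates the 20 samples where the patterns actually differ.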

The poor performance of correlation in a low-frequency problem has
been shown experimentally by W. Bezdel in his paper on the recognition
of vowels by computer program using zero-crossing data (3). He noted
poor results using correlation methods, although he does not explain
why. A little thought on the matter shows that it would have been an
anomaly if his results had been good, since the tool has little power
at these frequencies.

Bezdel and Chandler indicated that they had success in their
comparison work using a Euclidean distance measurement, that is, a
point-by-point difference calculation between corresponding points on
the "unknown" vowel and a previously stored "dictionary" vowel. The
dictionary entry yielding the least total difference from the unknown
would be selected as the best match. Such a method seemed of great
interest to this researcher since it was simple in concept, and
therefore in keeping with my personal philosophy that "if whatever you
are doing is complex and unwieldy, it is probably also wrong." This
technique was also readily programmable within the limitations of the
CDC 160 computer.
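Written out, the comparison amounts to choosing the dictionary entry at the smallest distance from the unknown. The notation here is introduced for illustration only: u[n] is the unknown pattern and d_k[n] is the k-th dictionary pattern, each N samples long. The first expression is the Euclidean distance named by Bezdel and Chandler; the second is the sum of absolute differences, which is the form described for the CDC 160 comparison program in Section 9.

```latex
D_k^{(2)} = \sqrt{\sum_{n=0}^{N-1} \left( u[n] - d_k[n] \right)^2 },
\qquad
D_k^{(1)} = \sum_{n=0}^{N-1} \left| u[n] - d_k[n] \right| ,
\qquad
\hat{k} = \arg\min_{k} D_k .
```

Both measures are zero only for identical patterns. The absolute-difference form avoids squaring, which is presumably an advantage on a machine as small as the CDC 160.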

The scheme employed in implementing this computer comparison, together
with a discussion of the CDC 160 computer and the peripherals used, is
described below.
9.    CDC  160  Computer  and  Algorithms.

The Control Data Corporation 160 computer (see Figure 8) used is a
parallel, single-address electronic data processor controlled by an
internally stored program in sequential locations. Memory capacity is
4096 twelve-bit binary words. Instructions are executed in one to four
storage cycles, with the time varying from 6.4 to 25.6 microseconds.
Instructions can be entered either manually via finger controls on the
console face or by punched paper tape. Data can be entered as above or
from externally selected equipment.

The CDC 163 magnetic tape unit makes it possible to work with many
more than the 4080 memory cells contained in the computer, by allowing
information that is not immediately being used to be dumped onto tape,
thereby freeing more memory locations for use. Tape-stored data can be
recalled at any later time.
Figure 8. Block diagram of CDC-160 computer and peripherals
as used in this work.

The DD-65 remote display unit provides a rapid means of
troubleshooting programs in assembly language before a library
assembler program is used to prepare the bioctal tape required by the
CDC 160. This unit was also used to display the error-versus-time
graphs during the comparisons, as discussed later.

The analog-to-digital converter used is not a commercial unit; it was
constructed here at the Naval Postgraduate School by the Digital
Control Laboratory personnel. The A/D conversion unit is basically a
multiplex sampling system which can sample one input at a time or up
to 12 inputs in multiplex. By sampling each signal at the Nyquist rate
(twice the highest frequency present) or higher, the digital samples
will contain all the information that existed in the original analog
waveform. The signals sampled here were limited to below 500 Hz, so a
sampling rate of 1 kHz was used in all my work. There was a time
difference of approximately 100 microseconds between corresponding
samples of the two inputs because of the multiplexed sampling. This
error could be reduced somewhat by more judicious programming, but it
was not considered serious enough to warrant changes at this time.
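The sampling figures quoted above can be checked with a few lines of arithmetic. This sketch assumes, for illustration, that the 100 microsecond inter-channel skew is judged against the highest frequency passed (500 Hz). At that frequency the skew amounts to one-tenth of a sample interval and an 18° phase offset, and a 512-sample word spans 0.512 seconds.

```python
import math

f_max = 500.0   # highest frequency in the sampled analogs (Hz)
fs = 1000.0     # sampling rate used (Hz)
skew = 100e-6   # approximate delay between multiplexed channels (s)

nyquist_rate = 2 * f_max                 # minimum adequate sampling rate (Hz)
skew_in_samples = skew * fs              # skew as a fraction of one sample interval
phase_offset_deg = 360.0 * f_max * skew  # phase error between channels at f_max
record_s = 512 / fs                      # duration covered by one 512-sample word

print(nyquist_rate, skew_in_samples, phase_offset_deg, record_s)
```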

In  order  to  make  a  Euclidean  comparison,  a  program  was  written  to 
perform  the  following  operations: 

1) Have the computer wait for the input to exceed a 0.2 volt threshold
before requesting samples from the A/D converter. When a signal
greater than the threshold is sensed, the computer recognizes that a
word is being input and takes 512 samples from each of the first and
second formant channels. The number 512 was chosen because it is
sufficient to cover the duration of the words zero to nine used in
this work, and also because it is 1000 in the octal number system used
by the computer. The DD-65 display unit can plot only 1000 (octal)
points, so any larger number of samples would spoil the use of this
analysis tool.

2) Subtract each first formant sample from the corresponding second
formant sample, with the appropriate signs, and store the differences
on magnetic tape in the CDC 163.

3) Return the computer to the thresholding operation until another
0.2 volt signal appears at the input.
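A minimal Python sketch of the dictionary-building steps follows. It is offered as an illustration, not a transcription of the CDC 160 machine-language programs in Appendix V. The sample source is a hypothetical iterable of (formant one, formant two) voltage pairs standing in for the A/D converter, a Python list stands in for the CDC 163 tape, and applying the 0.2 volt threshold to the larger of the two channel magnitudes is an assumption.

```python
from typing import Iterable, List, Optional, Tuple

THRESHOLD_V = 0.2   # trigger level from the text
WORD_SAMPLES = 512  # samples per word (1000 in octal)

def capture_word(samples: Iterable[Tuple[float, float]]) -> Optional[List[float]]:
    """Wait for the threshold, then return 512 samples of F2 - F1.

    Returns None if the stream ends before a full word is captured.
    """
    samples = iter(samples)
    # Step 1: idle until either channel exceeds the threshold (assumed rule).
    for f1, f2 in samples:
        if max(abs(f1), abs(f2)) > THRESHOLD_V:
            word = [f2 - f1]
            break
    else:
        return None
    # Steps 1-2: collect the rest of the word, keeping only the difference.
    while len(word) < WORD_SAMPLES:
        pair = next(samples, None)
        if pair is None:
            return None
        f1, f2 = pair
        word.append(f2 - f1)
    return word

def build_dictionary(samples: Iterable[Tuple[float, float]],
                     words: int = 10) -> List[List[float]]:
    """Capture `words` utterances in order (zero through nine).

    Step 3: after each word, return to thresholding for the next one.
    """
    samples = iter(samples)
    dictionary: List[List[float]] = []
    while len(dictionary) < words:
        word = capture_word(samples)
        if word is None:
            break
        dictionary.append(word)
    return dictionary
```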

After the numbers zero through nine have been stored on magnetic tape
by the program described above, a second program is used to make the
comparisons needed to determine which number is being spoken. This
program works as follows:

1) The computer waits for a 0.2 volt signal as above and, upon sensing
it, takes samples and subtracts the first formant from the second,
storing the differences in memory.

2) The CDC 160 calls for the first stored word from the CDC 163
magnetic tape unit, sums the absolute values of the sample-by-sample
differences between that word and the unknown word, and stores this
error sum in memory.

3) Each of the remaining nine words stored on tape is subsequently
called into the computer for comparison, and its error sum is stored
in memory.

4) When all ten words from the dictionary have been compared to the
unknown word, the error sums are compared, and the word with the
lowest error sum is taken to be the best fit to the unknown. Its
corresponding number is then shown in the register of the CDC 160 as
the number spoken.
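The recognition steps can likewise be sketched in Python. As before, this is a hedged stand-in for the Appendix V programs. The dictionary is assumed to be a sequence of ten 512-sample F2 - F1 patterns indexed by the number each represents. The per-sample error list is the kind of error-versus-time trace described below for the DD-65 display.

```python
from typing import List, Sequence, Tuple

def error_trace(unknown: Sequence[float], entry: Sequence[float]) -> List[float]:
    """Absolute difference, sample by sample (error versus time)."""
    return [abs(u - d) for u, d in zip(unknown, entry)]

def recognize(unknown: Sequence[float],
              dictionary: Sequence[Sequence[float]]) -> Tuple[int, List[float]]:
    """Return the index of the best-matching entry and every entry's error sum."""
    # Steps 2-3: one error sum per dictionary word.
    error_sums = [sum(error_trace(unknown, entry)) for entry in dictionary]
    # Step 4: the lowest error sum is taken as the number spoken.
    best = min(range(len(error_sums)), key=error_sums.__getitem__)
    return best, error_sums
```

For example, with synthetic constant patterns, an unknown lying closest to entry 6 yields index 6.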
If this monosyllabic recognition technique proved fruitful, one could
extend it to polysyllabic words by presorting according to word length
or syllable count. Either type of classification could be accomplished
without much additional computer memory. The larger vocabulary could
also be stored on magnetic tape, where there is ample storage capacity.

To further assist in analyzing the comparison work, the DD-65 display
unit was programmed to display a plot of error versus time for each of
the ten comparisons being made. This allows one to see on a real-time
display which words were close to the unknown but should not have
been, as well as which part of the correct word differed and caused
the comparison to be unsatisfactory. Such a display proved very
beneficial in the subsequent work of adjusting the circuits to improve
performance.

The computer programs described above are contained in Appendix V.
10.   Conclusions  and  Recommendations. 

The numbers zero through nine were recorded by five male speakers.
Computer recognition was attempted using each of the voices in turn as
the dictionary. Results were excellent for each voice compared against
itself, as would be expected. Inter-voice comparisons were not
consistently successful for certain of the numbers, as discussed below.

As can be seen from the pictures in Appendix IV, certain numbers are
very distinctive in shape, and as such are easy to match across many
different voices. Examples of this are 6, 7, and 8. It was possible to
identify these numbers with any one of the five voices as the
dictionary.
The remaining numbers are different enough to provide good
identification if the speakers speak normally and clearly. It was
noted that people frequently try to speak very clearly (usually so
much so that it is unnatural) when asked to speak into a microphone
for testing. This presented a problem in identifying the number three:
some speakers said "th'-ree," which gives a different pattern from the
monosyllabic version of the word. Aside from such anomalies, results
were very good, especially if each speaker heard how the others
pronounced the words.

From the above testing it was apparent that some work would be
required to make the patterns that were close in shape more distinct,
so that more leeway might be allowed for individual speaker
mannerisms. The input bandwidth of each channel was varied in the hope
of increasing the differences between patterns. Since the first
formant is expected to lie somewhere below 1 kHz, this channel was
bandlimited to between 300 Hz and 1 kHz. The second formant lies
somewhere between 800 and 4000 Hz, so this channel was limited to that
range. Other bands were also tried, but these settings seemed to do as
well as or better than any others, and they are reasonable for the
parameters being extracted.
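For digital experiments, the channel bandlimiting can be approximated as below. This is a hedged sketch. The original work used analog filters whose type and order are not stated, whereas this sketch uses an idealized FFT brick-wall band-pass. The function name is hypothetical; only the band edges (300 Hz to 1 kHz for formant one, 800 Hz to 4 kHz for formant two) come from the text.

```python
import numpy as np

F1_BAND = (300.0, 1000.0)  # Hz, first formant channel
F2_BAND = (800.0, 4000.0)  # Hz, second formant channel

def brickwall_bandpass(x, fs, band):
    """Zero every FFT bin outside [low, high] Hz and transform back."""
    low, high = band
    spectrum = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    spectrum[(freqs < low) | (freqs > high)] = 0.0
    return np.fft.irfft(spectrum, n=len(x))
```

Applied to a test signal containing 150 Hz, 600 Hz, and 2000 Hz tones, the formant-one channel keeps only the 600 Hz tone and the formant-two channel keeps only the 2000 Hz tone. A brick-wall filter rings on transients, so a practical digital version would more likely use a smoother filter.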

The output low-pass filters were also varied, with cutoff frequencies
ranging from 300 Hz to 1000 Hz. The optimal setting for both channels
seemed to be 500 Hz.

Under these new conditions, the results obtained for the same five
voices were improved, but problems still existed for the numbers 1, 4,
5, and 0. The error display indicated that the real key to
discriminating between the patterns lies in the substantial excursions
caused by plosives, affricates, and fricatives, and that words which
contain none of these are inherently in trouble. With practice, all
the speakers tested began saying even these words the same way, and
higher recognition scores were realized. In this regard the device
functioned well as a speech training aid, because all found it quite
easy to imitate the enunciation of the best speaker and obtain his
good patterns. For those people who speak clearly and crisply, results
for this ten-word vocabulary would be very good.

After testing the display as discussed above, it was evident that this
scheme as it now stands is not sufficient for dependable computer
recognition of speech. There is enough information available in these
patterns to render them far from valueless and to add support to the
theory of the interconnection of zero-crossing rate and formant
frequencies, but more information is needed for errorless
identification of speech.

Better results would seem possible for this type of comparison if an
average of several voices were used as the dictionary. This would tend
to minimize particular voice characteristics and accentuate the
general sameness of the words being spoken. Time did not permit me to
explore this possibility; it is recommended to anyone who wishes to
pursue this work further.
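In its simplest form, the recommended multi-voice dictionary could be built by averaging each number's stored patterns sample by sample across speakers. This is a sketch of a suggestion that was not tested in this work. The array layout (speakers × numbers × samples) is an assumption, and no time alignment between speakers is attempted, so differences in speaking rate would blur the averaged patterns.

```python
import numpy as np

def averaged_dictionary(patterns):
    """Average F2 - F1 patterns over speakers.

    patterns: array-like of shape (speakers, 10, 512) holding each
    speaker's stored pattern for the numbers zero through nine.
    Returns an array of shape (10, 512).
    """
    patterns = np.asarray(patterns, dtype=float)
    return patterns.mean(axis=0)
```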

If one holds the theory that the formants contain the information, and
further that the zero-crossing rate is a measure of the formant
frequencies, then it is logical to say that sufficient information is
available here for error-free speech identification, and that the
problem lies in the way this information is being handled. It was not
proposed at the outset of this work that a Euclidean comparison of the
patterns was the optimal way of performing recognition, and the
results indicate that it is far from satisfactory. Since speech
signals are statistical in nature, it is reasonable to expect that any
comparison system that does not allow for this nature will not be
satisfactory. This researcher proposes that statistical methods be
employed in future comparison systems.

The addition of another speech parameter to produce a four-dimensional
plot (formant one, formant two, time, and some other parameter) is also
worthy of consideration. While difficult to visualize, a
four-dimensional plot would be no problem to implement on even a
computer as small as the one used for this project.

Satisfactory performance of a speech recognition scheme for even as
small a vocabulary as the numbers zero through nine would have
possible applications today. An example is the verification of credit
card validity by a business. In my experience, few businesses check
their list of invalid credit cards beyond the first page or so,
apparently taking the attitude that if the customer has not run off by
the time they consult the list, his card is probably good. A rapid
telephone checking system could be realized by having the businessman
call a preassigned number that would connect him with the computer
listing of lost or stolen cards. By reading off the numbers, he could
have very rapid, current knowledge of the status of the card in
question. Such a system could prevent substantial losses that are
occurring at present.

In conclusion, it seems that the first and second formant analogs, as
extracted from the speech sounds here, are a worthy measure of the
intelligence being transferred, and more work on their processing is
warranted. With the computer handling the pattern comparisons and the
"error display" providing visual monitoring of the computer's
operation, more sophisticated comparison methods should improve on the
results obtained thus far.




BIBLIOGRAPHY 

1. Ainsworth, W., "Relative Intelligibility of Different Transforms
of Clipped Speech," Journal of the Acoustical Society of America,
Vol. 41, May 1967, pp. 1272-1276.

2. Barrett, N. A., "Extracting Analogue Signals from Noise Using a
Digital Computer," M.S. Thesis, Naval Postgraduate School, Monterey,
California, May 1966.

3. Bezdel, W. and Chandler, H. J., "Results of an Analysis and
Recognition of Vowels by Computer using Zero-Crossing Data,"
Proceedings of the Institution of Electrical Engineers (London),
Vol. 112, November 1965.

4. Biddulph, R., "Short Term Autocorrelation Analysis and
Correlationgrams of Spoken Digits," Journal of the Acoustical Society
of America, Vol. 26, July 1954, pp. 539-541.

5. Chang, S., Pihl, G. E., and Essigmann, M. W., "Representations of
Speech Sounds and Some of Their Statistical Properties,"
Proceedings of the Institute of Radio Engineers, Vol. 39, February
1951, pp. 147-153.

6. Denes, P. and Mathews, M., "Spoken Digit Recognition Using
Time-Frequency Pattern Matching," Journal of the Acoustical Society
of America, Vol. 32, November 1960, pp. 1450-1455.

7. Dietrich, W., "Calculations of the Effects of Peak Clipping on
Speech-Like Signals," M.S. Thesis, Naval Postgraduate School,
Monterey, California, December 1966.

8. Fant, C., Acoustic Theory of Speech Production. The Hague:
Mouton and Co., 1960.

9. Fairbanks, G., Voice and Articulation Drillbook. New York: Harper
and Rowe Co., 1952.

10. Fletcher, H., Speech and Hearing in Communication. New York: D.
Van Nostrand Company, 1953.

11. Gold, B. and Rader, C., "Systems for Compressing the Bandwidth of
Speech," IEEE Transactions on Audio and Electroacoustics, Vol. AU-15,
September 1967, pp. 131-136.

12. Hollabaugh, J., "Methods for Phonemic Recognition in Speech
Processing," M.S. Thesis, Naval Postgraduate School, Monterey,
California, 1963.

13. Huddy, N. W., Jr., "An Investigation of Methods of Improving the
Intelligibility of Audio Frequency Speech in Noise," M.S. Thesis,
Naval Postgraduate School, Monterey, California, October 1966.




14. Hughes, G., "The Recognition of Speech by Machine," Technical
Report No. 395, Research Laboratory of Electronics, MIT,
Cambridge, Massachusetts, May 1961.

15. Kinsler, L. E. and Frey, A. R., Fundamentals of Acoustics.
New York: John Wiley & Sons, Inc., 1950.

16. Licklider, J. and Pollack, I., "Effects of Differentiation,
Integration, and Infinite Peak Clipping on the Intelligibility
of Speech," Journal of the Acoustical Society of America, Vol.
20, January 1948, pp. 42-51.

17. Martin, T., Nelson, A., and Zadell, H., "Speech Recognition by
Feature-Abstraction Techniques," Technical Documentary Report No.
AL TDR 64-176, Radio Corporation of America, Camden, New Jersey,
August 1964.

18. Olson, H. and Belar, H., "Phonetic Typewriter III," Journal of
the Acoustical Society of America, Vol. 33, November 1961, pp.
1610-1615.

19. Olson, H. and Belar, H., "Printout System for the Automatic
Recording of the Spectral Analysis of Spoken Syllables," Journal of
the Acoustical Society of America, Vol. 34, February 1962, pp.
166-171.

20. Olson, H., Belar, H. and Rogers, E., "Speech Processing Techniques
and Applications," IEEE Transactions on Audio and Electroacoustics,
Vol. AU-15, September 1967, pp. 120-126.

21. Potter, R. K., Kopp, G. A., and Kopp, H. G., Visible Speech.
New York: Dover Publications, 1966.

22. Pyron, B. and Williams, F., Jr., "Signal Processing by Infinite
Clipping and Related Techniques," Final Report, Project A-727,
U.S. Government Contract DA 49-092-ARO-21, Georgia Institute of
Technology, Atlanta, Georgia, April 1964.

23. Sakai, T. and Doshita, S., "The Automatic Speech Recognition System
for Conversational Sound," IEEE Transactions on Electronic
Computers, Vol. EC-12, December 1963, pp. 835-846.

24. Sakai, T. and Inoue, S., "New Instruments and Methods for Speech
Analysis," Journal of the Acoustical Society of America, Vol. 32,
April 1960, pp. 441-450.

25. Sebestyen, G. S., Decision-Making Processes in Pattern Recognition.
New York: Macmillan Co., 1962.

26. Sholtz, P. and Bakis, R., "Spoken Digit Recognition Using
Vowel-Consonant Segmentation," Journal of the Acoustical Society of
America, Vol. 34, January 1962, pp. 1-5.




27. Shoup, J., "Phoneme Selection for Studies in Automatic Speech
Recognition," Journal of the Acoustical Society of America,
Vol. 34, April 1962, pp. 397-403.

28. Teacher, C., Kellett, H. and Focht, L., "Experimental, Limited
Vocabulary, Speech Recognizer," IEEE Transactions on Audio and
Electroacoustics, Vol. AU-15, September 1967, pp. 127-130.

29. Wilde, J., "A Speech Analysis and Compression Scheme for Bandwidth
Reduction," M.S. Thesis, Naval Postgraduate School,
Monterey, California, June 1959.

30. Williams, D., "A Visual Display of Certain Speech Parameters,"
M.S. Thesis, Naval Postgraduate School, Monterey, California,
1967.




APPENDIX  I 
PRESENTATION  AND  DISCUSSION  OF  PYRON  AND  WILLIAMSON'S  VECTOR  DISPLAY 

EQUIPMENT
Figure 9 shows a block diagram of the circuitry used by Pyron
and Williamson (Ref. 22) to produce their "vector display." This
circuitry was reproduced in the course of this study (see Section 6).

The input voice sound is repeatedly clipped and amplified in the
infinite clipper until the waveshape is virtually rectangular. A
Schmitt trigger completes the clipping action, yielding a wave whose
only information is the zero-crossing points of the original waveform.
This signal is then differentiated to give sharp pulses at the
zero-crossing points. These pulses in turn trigger the monostable
multivibrator, and a rectangular pulse of fixed width appears at its
output for each zero-crossing. Time averaging then produces a slowly
varying analog proportional to the zero-crossing rate of the input. If
the input is differentiated prior to processing, the analog will be
proportional to the rate of maxima and minima in the original
waveform.
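The averaging step can be stated as a worked equation. Assume, for illustration, that the monostable emits pulses of height V and fixed width τ, one per detected zero-crossing, and that the averaging time constant is long compared with the spacing of the pulses. If ν crossings occur per second, the output is high for a fraction ντ of the time, so the averaged output is

```latex
\bar{v} = V \, \tau \, \nu , \qquad 0 \le \nu \tau < 1 .
```

The average is therefore linear in the crossing rate as long as successive pulses do not overlap, that is, for ν < 1/τ. With a hypothetical τ = 0.1 ms, for example, the relation holds up to 10,000 crossings per second.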

The  amplitude  analog  circuit  half-wave  rectifies  the  original 
waveform  and  forms  a  smoothed,  slowly  varying  output  proportional  to 
the  amplitude  of  the  original  speech  signal. 
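A digital approximation of the amplitude analog is a half-wave rectifier followed by a smoothing filter. In this hedged sketch a one-pole (exponential) low-pass filter stands in for the unspecified smoothing network, and the 20 ms time constant is an assumption. For a steady sine wave of peak amplitude A, the average of the half-wave-rectified signal is A/π, and the output settles close to that value.

```python
import numpy as np

def amplitude_analog(x, fs, time_constant_s=0.02):
    """Half-wave rectify x, then smooth it with a one-pole low-pass filter."""
    rectified = np.maximum(np.asarray(x, dtype=float), 0.0)
    # Per-sample smoothing coefficient for the chosen time constant.
    alpha = 1.0 - np.exp(-1.0 / (fs * time_constant_s))
    out = np.empty_like(rectified)
    state = 0.0
    for i, v in enumerate(rectified):
        state += alpha * (v - state)  # exponential smoothing step
        out[i] = state
    return out
```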
Figure 9. Block diagram of circuits used to produce the
Vector Display.
APPENDIX II
CIRCUITS  USED  TO  EXTRACT  FORMANTS  ONE  AND  TWO  FROM  THE  SPEECH  SIGNAL 

The circuits described in this appendix were designed to extract from
a speech wave analog signals proportional to the first and second
formants. The operation of the second formant channel is as follows:

After the emitter follower input (see Figure 10), a differentiator
provides 6 dB per octave of high-frequency preemphasis to bring the
weaker second formant into the foreground. The differentiator feeds a
common-base transistor stage, which provides a low-impedance load and
thereby improves the differentiating action. Following the next
emitter follower is a common-emitter amplifier stage biased near
cut-off to provide clipping on the positive side of the signal. The
differentiator and diode that come next yield a pulse at each
positive-going zero-crossing to trigger the monostable multivibrator
(Figure 11). The monostable output pulses are then averaged by the
low-pass filter to provide a voltage analog of the second formant
frequency.

The first formant channel uses the same circuit and operates in the
same manner, except that the 6 dB per octave high-frequency pre-emphasis
provided by the first differentiator is omitted.
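The effect of the pre-emphasis can be checked numerically. Differentiating multiplies each component's amplitude by its frequency, so a second formant that is weaker but higher in frequency can come to dominate the zero-crossings. The sketch below uses an illustrative two-tone signal, x = sin θ + 0.5 sin 3θ with θ = 2π·500t, so that "F1" = 500 Hz and "F2" = 1500 Hz. These values are assumptions and not measured formants. For this mix, x = sin θ (2.5 − 2 sin²θ), and the second factor never falls below 0.5. The signal therefore crosses zero only where sin θ does, which is twice per 500 Hz period. Its derivative is proportional to cos θ (6 cos²θ − 3.5), which vanishes six times per 500 Hz period, that is, twice per 1500 Hz period.

```python
import numpy as np

def crossing_rate(x, fs):
    """Zero crossings per second, averaged over the whole signal."""
    signs = np.signbit(np.asarray(x, dtype=float))
    return np.count_nonzero(signs[1:] != signs[:-1]) * fs / len(x)

fs = 48000
t = np.arange(fs) / fs                      # one second of samples
f1, f2 = 500.0, 1500.0                      # illustrative "formant" frequencies
x = np.sin(2 * np.pi * f1 * t) + 0.5 * np.sin(2 * np.pi * f2 * t)

raw_rate = crossing_rate(x, fs)             # first formant channel: no pre-emphasis
emph_rate = crossing_rate(np.diff(x), fs)   # second formant channel: differentiated input
print(raw_rate / 2, emph_rate / 2)          # half the crossing rate ~ dominant frequency
```

Halving the two crossing rates gives roughly 500 and 1500. The undifferentiated channel follows the stronger first formant, while the differentiated channel follows the weaker second formant.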
APPENDIX III

The patterns contained in this appendix were generated with the
circuitry discussed in Section 7. They are typical of this type of
processing and are very sensitive to circuit parameters such as gain,
bandwidth and the degree of output averaging used. All the patterns
shown here were generated under the same circuit conditions.

Each photograph contains three patterns of the same number spoken
by three different male speakers. The order of speakers and the
direction of the components, as shown below, are the same for all the
pictures in this appendix.


Number 1
[Pattern photograph; axes labeled formant 1, formant 2 and time]
Figure 12. Formant 1 versus formant 2 versus time (diagonal)
for numbers 2 through 5.
Figure 13. Formant 1 versus formant 2 versus time
for numbers 6 through 9.


APPENDIX IV
The patterns shown in this appendix were generated by the circuits
outlined in Appendix II and discussed in Section 8. All were made
under the same circuit conditions and in the same environment. These
are the type of patterns used in the attempts at computer recognition
of speech described in Sections 9 and 10.
Figure 14. Formant 2 minus formant 1 versus time.
Figure 15. Formant 2 minus formant 1 versus time.
Figure 16. Formant 2 minus formant 1 versus time.
APPENDIX V
COMPUTER PROGRAMS
1. Program DICTIONARY.

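This program is designed to take 600 samples from each of channels 1
and 2 of the analog-to-digital converter, subtract the channel 1 sample
from the channel 2 sample, and store the differences on magnetic tape
for future use. Sampling begins only after the channel 1 input exceeds
a threshold (0122, or 0.2 V, in the listing).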
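The DICTIONARY procedure can be summarized in the following sketch. It is not a translation of the CDC 160 listing. read_channel is a hypothetical stand-in for the analog-to-digital converter that returns volts. The sketch assumes that channel 1 carries the first formant analog and channel 2 the second. The 0.2 V threshold and the 600-sample count follow the text and listing. The result is returned as an array rather than written to tape.

```python
import numpy as np

def record_template(read_channel, n_samples=600, threshold=0.2):
    """Sketch of DICTIONARY: wait for the word to start, then store F2 - F1.

    read_channel(ch) is a hypothetical stand-in for the A/D converter: it returns
    the next sample, in volts, from channel 1 (assumed formant 1) or 2 (formant 2).
    """
    # idle until the channel 1 input exceeds the threshold
    while read_channel(1) <= threshold:
        pass
    template = np.empty(n_samples)
    for i in range(n_samples):
        f1 = read_channel(1)
        f2 = read_channel(2)
        template[i] = f2 - f1  # channel 2 minus channel 1
    return template  # the original program writes this block to tape unit #1 instead
```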


Cell   Contents   Code    Explanation
0000   0101       PTA     Puts zero in A register
0001   4071       STD71   Puts zero in cell 71
0002   2200       LDC     Loads the following number into the A register
0003   0600       0600
0004   4070       STD70   Puts 600 in cell 70
0005   7500       EXF00   Call A/D channel 1
0006   1401       1401
0007   7600       INA     Input
0010   4063       STD63   Store sample in cell 63
0011   2463       LCD63   Load complement of sample into the A register
0012   3642       SBF42   Subtract threshold value stored 42 cells ahead
0013   6704       NJB04   If threshold not exceeded, go back and sample again
0014   6206       PJF06
0015   7500       EXF00
0016   1401       1401
0017   7600       INA
0020   4063       STD63
0021   2463       LCD63
0022   4076       STD76
0023   7500       EXF00   Call A/D channel 2
0024   1402       1402
0025   7600       INA
0026   4063       STD63
0027   2463       LCD63
0030   3476       SBD76   Channel 2 minus channel 1
0031   4151       STI51   Store difference in address contained in cell 51
0032   5617       AOF17
0033   2620       LCF20   Time delay for proper sample spacing
0034   0601       ADN01
0035   6501       NZB01
0036   5471       AOD71
0037   3470       SBD70
0040   6717       NJB17   Go back and take next samples
0041   7500       EXF00   Call tape unit #1
0042   2111       2111
0043   7304       OUT
0044   3131       3131    Last word address plus 1 of data for tape storage
0045   2200       LDC
0046   2000       2000    First address of data for tape
0047   4202       STF02   Initialize for next run
0050   7700       HLT     Halt
0051   2000       2000
0052   2000       2000
0053   0071       0071    Time delay constant
0054   0122       0122    0122 = 0.2 V threshold

2. Program COMPARE.

This program compares an unknown word with ten words previously
stored on tape and plots error versus time on the DD-65 display unit
for each of the ten comparisons. When the ten comparisons are complete,
the tape location of the closest-matching word is stored in cell 51.
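In outline, the comparison might be sketched as follows. This is an illustration under stated assumptions and not the program itself. The per-sample error is taken as the absolute difference between the unknown and each stored word. The listing appears to form this with an LDM/SBM pair, reversing the operands when the first result is negative. The stored words are held in a Python list rather than on tape, and the returned index stands in for the tape location left in cell 51. The abstract describes the comparison as Euclidean; ranking by the sum of squared differences only requires replacing np.abs(d) with d**2.

```python
import numpy as np

def compare(unknown, dictionary):
    """Return (index of closest stored word, per-word error-versus-time curves)."""
    x = np.asarray(unknown, dtype=float)
    curves = []
    for stored in dictionary:
        d = x - np.asarray(stored, dtype=float)
        curves.append(np.abs(d))        # error at each sample time (what would be plotted)
    totals = [curve.sum() for curve in curves]
    best = int(np.argmin(totals))       # plays the role of the location left in cell 51
    return best, curves

# usage with synthetic data: ten random "words" and a noisy repeat of word 3
rng = np.random.default_rng(0)
dictionary = [rng.normal(size=600) for _ in range(10)]
unknown = dictionary[3] + 0.01 * rng.normal(size=600)
best, curves = compare(unknown, dictionary)  # best == 3
```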

Cell   Contents   Code
0100   2200       LDC
0101   2000       2000
0102   4243       STF43
0103   2200       LDC
0104   4000       4000
0105   4202       STF02
0106   0400       LDN00
0107   4077       STD77
0110   5701       AOB01
0111   0277       LPN77
0112   3647       SBF47
0113   6705       NJB05
0114   0512       LCN12
0115   4067       STD67
0116   2200       LDC
0117   1130       1130
0120   4070       STD70
0121   7500       EXF00
0122   1401       1401
0123   7600       INA
0124   4063       STD63
0125   2463       LCD63
0126   3600       SBC
0127   0122       0122
0130   6705       NJB05
0131   6205       PJF05
0132   7500       EXF00
0133   1401       1401
0134   7600       INA
0135   4063       STD63
0136   2463       LCD63
0137   4076       STD76
0140   7500       EXF00
0141   1402       1402
0142   7600       INA
0143   4063       STD63
0144   2200       LDC
0145   2000       2000
0146   4077       STD77
0147   2463       LCD63
0150   3476       SBD76
0151   4177       STI77
0152   5705       AOB05
0153   2611       LCF11
0154   0601       ADN01
0155   6701       NJB01
0156   0407       LDN07
0157   6207       PJF07
0160   0000       0000
0161   0077       0077
0162   7000       7000
0163   0700       0700
0164   0071       0071
0165   0122       0122
0166   5471       AOD71
0167   3470       SBD70
0170   6736       NJB36
0171   2200       LDC
0172   5001       5001
0173   4236       STF36
0174   2200       LDC
0175   5002       5002
0176   4237       STF37
0177   2200       LDC
0200   5003       5003
0201   4241       STF41
0202   2200       LDC
0203   5004       5004
0204   4241       STF41
0205   0400       LDN00
0206   4075       STD75
0207   7500       EXF00
0210   2131       2131
0211   7203       INP03
0212   4265       4265
0213   6102       NZF02
0214   3134       3134
0215   2100       LDM
0216   2000       2000
0217   3500       SBM
0220   3134       3134
0221   6205       PJF05
0222   2100       LDM
0223   3134       3134
0224   3500       SBM
0225   2000       2000
0226   7101       JFI01
0227   6174       NZF74
0230   0110       LS03
0231   5051       RAD51
0232   2055       LDD55
0233   1350       LPB50
0234   0110       LS03
0235   5052       RAD52
0236   2055       LDD55
0237   0270       LPN70
0240   0111       LS06
0241   0110       LS03
0242   5053       RAD53
0243   2055       LDD55
0244   0207       LPN07
0245   5054       RAD54
0246   5730       AOB30
0247   5727       AOB27
0250   5723       AOB23
0251   5726       AOB26
0252   5475       AOD75
0253   3600       SBC
0254   1000       1000
0255   6740       NJB40
0256   2200       LDC
0257   2000       2000
0260   4342       STB42
0261   4334       STB34
0262   2200       LDC
0263   3134       3134
0264   4344       STB44
0265   4342       STB42
0266   0404       LDN04
0267   5336       RAB36
0270   0404       LDN04
0271   5334       RAB34
0272   0404       LDN04
0273   5331       RAB31
0274   0404       LDN04
0275   5330       RAB30
0276   2067       LDD67
0277   7101       JFI01
0300   7700       HLT
0301   5614       AOF14
0302   0401       LDN01
0303   5000       RAD00
0304   2001       LDD01
0305   6105       NZF05
0306   2000       LDD00
0307   4065       STD65
0310   5701       AOB01
0311   5457       AOD57
0312   0404       LDN04
0313   5307       RAB07
0314   0411       LDN11
0315   0701       SBN01
0316   6615       PJB15
0317   2200       LDC
0320   2001       2001
0321   4315       STB15
0322   2200       LDC
0323   4065       4065
0324   4315       STB15
0325   2200       LDC
0326   0701       0701
0327   4312       STB12
0330   2065       LDD65
0331   6142       NZF42
0332   0401       LDN01
0333   5051       RAD51
0334   5622       AOF22
0335   0401       LDN01
0336   5053       RAD53
0337   2001       LDD01
0340   3405       SBD05
0341   6312       NJF12
0342   2302       LDB02
0343   0277       LPN77
0344   3200       ADC
0345   2000       2000
0346   4307       STB07
0347   2053       LDD53
0350   5051       RAD51
0351   0400       LDN00
0352   4053       STD53
0353   0404       LDN04
0354   5314       RAB14
0355   0411       LDN11
0356   0701       SBN01
0357   6623       PJB23
0360   2200       LDC
0361   0701       0701
0362   4304       STB04
0363   2200       LDC
0364   3405       3405
0365   4325       STB25
0366   2200       LDC
0367   2001       2001
0370   4331       STB31
0371   0401       LDN01
0372   6264       PJF64
0373   2066       LDD66
0374   6060       ZJF60
0375   5615       AOF15
0376   2001       LDD01
0377   6110       NZF10
0400   2302       LDB02
0401   0601       ADN01
0402   0277       LPN77
0403   4030       STD30
0404   2130       LDI30
0405   4072       STD72
0406   5701       AOB01
0407   0404       LDN04
0410   5312       RAB12
0411   0411       LDN11
0412   0701       SBN01
0413   6616       PJB16
0414   2200       LDC
0415   4072       4072
0416   4311       STB11
0417   2200       LDC
0420   2001       2001
0421   4323       STB23
0422   2200       LDC
0423   0701       0701
0424   4312       STB12
0425   2072       LDD72
0426   3473       SBD73
0427   6212       PJF12
0430   2065       LDD65
0431   4051       STD51
0432   0402       LDN02
0433   3457       SBD57
0434   6022       ZJF22
0435   2072       LDD72
0436   3474       SBD74
0437   6317       NJF17
0440   6211       PJF11
0441   2066       LDD66
0442   4051       STD51
0443   0402       LDN02
0444   3457       SBD57
0445   6011       ZJF11
0446   2074       LDD74
0447   3473       SBD73
0450   6206       PJF06
0451   2067       LDD67
0452   4051       STD51
0453   6203       PJF03
0454   2065       LDD65
0455   4051       STD51
0456   7500       EXF00
0457   1121       1121
0460   2051       LDD51
0461   7101       JFI01
0462   7600       INA
0463   2066       LDD66
0464   4051       STD51
0465   0402       LDN02
0466   3457       SBD57
0467   6011       ZJF11
0470   2074       LDD74
0471   3473       SBD73
0472   6206       PJF06
0473   2067       LDD67
0474   4051       STD51
0475   6203       PJF03
0476   2065       LDD65
0477   4051       STD51
0500   7500       EXC
0501   1121       LPI21
0502   2051       LDD51
0503   7700       HLT